(57) Abstract: 




PROBLEM TO BE SOLVED: To provide a conversion 
method that converts a document containing 
figures and tables created with a word processor 
or the like into a document in a format suited to 
the user's document preparation and reference 
environment. 

SOLUTION: A description format decision 203 
determines whether an input document 201 is 
written in SGML (a document description language). 
If it is not, common-format conversion following 
SGML syntax is executed to create an SGML document 
205. When the document contains figures, files 209 
and 210 are created for each figure through syntax 
conversion 206 and converted (207) into the desired 
description format. 
* NOTICES * 

JPO and NCIPI are not responsible for any 
damages caused by the use of this translation. 

1. This document has been translated by computer. So the translation may not reflect the 
original precisely. 

2. **** marks a word that could not be translated. 
3. Words in the drawings are not translated. 



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a document processing program that runs 
on a computer, and in particular to a document processing method that performs structural 
transformation and description-format conversion of documents, especially documents 
created with a word processor or similar software. 
[0002] 

[Description of the Prior Art] The spread of word processors advanced the electronization 
of documents and made it possible to reuse documents created in the past, editing them to 
create new ones. However, because word processors of many different models existed and 
each model used its own proprietary document description format, exchanging document 
data between different models was difficult. Plain-text documents could be used on every 
model, but documents containing figures and tables, or documents with layout 
specifications, could not be exchanged or reproduced. 

[0003] To solve this problem, SGML, a standard document description language for 
expressing the logical structure of documents, was proposed (ISO 8879, Information 
processing - Text and office systems - Standard Generalized Markup Language (SGML)). A 
DTD (Document Type Definition) defines the structure of a document and the set of 
structure elements that constitute it, and an SGML document is described on the basis of 
the DTD. Each structure element of a document is marked explicitly by enclosing it in tags. 
For example, the description "<title>Conversion Method</title>" expresses that the title of 
the document is "Conversion Method." A structure element name (here "title") enclosed in 
"<" and ">" is called a start tag, and one enclosed in "</" and ">" is called an end tag. 
[0004] Figures and tables can be described inside a document as long as their data follow a 
text-based notation, such as PostScript data. Image data in a binary format, however, 
cannot be described directly in a document; instead, the file in which the image data is 
stored is referenced using an "entity declaration." In either case, a "NOTATION declaration" 
indicates the description format of the figure or table. 
[0005] In addition, layout information is not included in the document itself. In a system 
that performs document layout, the document is laid out by associating structure elements 
with layout specifications. This makes it possible to create documents that are 
independent of the device used to create them, and documents containing figures and 
tables become reusable. 
[0006] 

[Problem(s) to be Solved by the Invention] When introducing SGML, building a system that 
allows documents to be reused, including documents created with word processors, 
requires converting documents written in the description formats of the various word 
processor models into standard SGML documents. Furthermore, broad use of documents 
also requires converting SGML documents into other formats. 
[0007] The purpose of this invention is to provide a conversion method that converts a 
document containing figures and tables created with a word processor or the like into a 
document in a format suited to the user's document preparation and reference 
environment, not limited to SGML documents. 
[0008] 

[Means for Solving the Problem] In a document created with a word processor or the like, 
the character strings representing the document contents, the character strings 
representing figure and table data, and the specific character strings representing layout 
information for text and for figures and tables are all written in a uniquely defined 
document description format. If the specific character strings contained in such a word 
processor document are replaced with SGML tag expressions, a document that formally 
conforms to SGML syntax can be generated. 

[0009] The invention described in JP,7-105216,A takes an SGML document as its input 
document and analyzes its document structure. By providing means for performing 
processing corresponding to each structure element that constitutes the document 
structure, it lets a user easily specify processing for character-string conversion per 
structure element and for structural transformation of the document structure. SGML 
document conversion is then realized by executing the processing specified for each 
structure element while following the document structure. Therefore, by treating the 
SGML-described document generated from a word processor document as an SGML 
document, that document can also be converted using the above method. 
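The idea of attaching user-specified processing to each structure element and executing it while following the document structure can be sketched as below. This is not the interface of JP,7-105216,A, which is not described here; the `Element` class, the `convert` function, and the handler table are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Union


@dataclass
class Element:
    """A structure element: a name plus child elements and character strings."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


# A handler receives the element and the already-converted text of its children.
Handler = Callable[[Element, str], str]


def convert(node: Union[Element, str], handlers: Dict[str, Handler]) -> str:
    """Walk the tree bottom-up and apply the processing registered for each element name."""
    if isinstance(node, str):
        return node  # character data passes through unchanged
    inner = "".join(convert(child, handlers) for child in node.children)
    # Elements with no registered processing simply contribute their content.
    handler = handlers.get(node.name, lambda element, body: body)
    return handler(node, inner)


# Hypothetical user-specified processing for two structure elements.
handlers: Dict[str, Handler] = {
    "title": lambda element, body: f"<h1>{body}</h1>\n",
    "para": lambda element, body: f"<p>{body}</p>\n",
}

doc = Element("doc", [Element("title", ["Conversion Method"]), Element("para", ["Body text."])])
print(convert(doc, handlers), end="")
```

Keeping the processing in a table keyed by element name means a new output format only needs a new table, not a new traversal.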

[0010] When a document contains figures and tables, it is also necessary to extract figure 
data as image files, and to convert table data into a format that other applications can 
use. When a word processor document is converted into SGML description, the specific 
character strings that always appear at the beginning and end of the figure and table data 
it contains are replaced with tags whose names denote figure data or table data. The part 
enclosed by the tags for figure data can be regarded as a figure structure element, and the 
figure data can be cut out by applying to it a process that writes its contents to a separate 
file. If necessary, this data is then converted into a binary format to generate an image file. 
Similarly, the part enclosed by the tags for table data can be regarded as a table structure 
element. For table data, the specific character strings that carry information about the 
table's structure, such as ruled-line positions, are also replaced with tags, so the table data 
itself is expressed as table structure data composed of structure elements such as 
ruled-line information. Because processing can then be defined for each of these structure 
elements as well, it is easy to generate a table that stores all the information about the 
structure of the table and thereby grasp that structure. In addition, table data in the 
notation desired by the user can be generated by performing structural transformation and 
character-string conversion per structure element. 
[0011] 

[Embodiment of the Invention] An embodiment of this invention is described below with 
reference to the drawings. 

[0012] Drawing 1 shows, as an example of a system configuration using the document 
conversion method of this invention, a configuration in which a document conversion 
program is placed on a computer connected to a network. The computer 1 connected to the 
network 7 consists of a display 2, a data entry unit 3 such as a keyboard, a CPU 4, memory 
5, and data files 6 for saving documents entered from the data entry unit 3 and documents 
acquired from other computers through the network 7. The memory 5 holds: the document 
conversion program 5-1; the common-format conversion program 5-2, started from the 
document conversion program 5-1; the document structure analysis program 5-3, also 
started from the document conversion program 5-1; the document structure storage area 
5-4, which stores the document structure data that the document structure analysis 
program 5-3 generates by reading an SGML document and analyzing its structure; the table 
structure storage area 5-5 and the image storage area 5-6, which store the table data and 
the image data that the document conversion program 5-1 extracts from an SGML 
document containing figures and tables; and the image conversion program 5-7, which 
converts the data format of image files. 

[0013] Drawing 2 outlines document conversion processing. The input document may be a 
document created on the computer 1, a document stored on portable media such as a 
floppy disk or CD-ROM, or a document acquired through the network 7. The user specifies 
the description format of the output document when the document is input. Document 
conversion processing first determines the description format of the input document. If 
the input document is an SGML document, tree-structured document structure data as 
shown in the drawing is generated, and structural transformation and description-format 
conversion are applied to this data to produce a document in the specified format. A 
document that is not an SGML document is first converted into a common-format 
document in SGML description. One way to do this is to replace the commands in a word 
processor document that specify layout, such as centering a character string, with tag 
expressions. By generating a common-format document in this way and treating it as an 
SGML document, structural transformation and description-format conversion are 
performed exactly as for an SGML document. When the document contains figures and 
tables, the generated document structure is divided into the text part, the figure part, and 
the table part, so it is also possible to output only the figure part as an image file, or only 
the table part as a table data file. 
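The overall flow above can be outlined as a short driver. This is a sketch under stated assumptions: `detect_format` and `convert_document` are hypothetical names, the two detection rules (a leading `<` means SGML, a leading `\documentstyle` or `\documentclass` means LaTeX) are illustrative only, and the individual stages are passed in as functions.

```python
from typing import Callable


def detect_format(data: str) -> str:
    """Guess the input description format from the head of the document.

    The two rules here are illustrative assumptions, not an exhaustive detector.
    """
    head = data.lstrip()[:200]
    if head.startswith("<"):
        return "sgml"  # e.g. starts with <!DOCTYPE ...> or a start tag
    if head.startswith("\\documentstyle") or head.startswith("\\documentclass"):
        return "latex"
    return "unknown"


def convert_document(
    data: str,
    to_common_format: Callable[[str, str], str],
    parse: Callable[[str], object],
    transform: Callable[[object], object],
    emit: Callable[[object], str],
) -> str:
    """Overall flow: non-SGML input is first rewritten into the SGML common format."""
    fmt = detect_format(data)
    if fmt != "sgml":
        data = to_common_format(data, fmt)
    tree = parse(data)       # tree-structured document structure data
    tree = transform(tree)   # structural transformation
    return emit(tree)        # description-format conversion and output
```

After the common-format step, SGML and non-SGML inputs share every later stage, which is the point of the method.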
[0014] Drawing 3 shows the flowchart of the document conversion program. At step 301, the 
description format of the input document is determined. Many documents created with 
word processors specify their description format at the head of the document data, so the 
format can be determined easily by inspecting the head. If the input document is not an 
SGML document, common-format conversion, which converts the document data into a 
common-format document in SGML description, is performed at step 303. Common-format 
conversion at step 303 is described in detail below, taking as the input document the text 
document written in LaTeX syntax shown in drawing 4. A LaTeX document begins with 
\documentstyle{...} and consists of character strings representing the document contents 
and commands concerning the layout of the document. Commands beginning with "\" (for 
example, \title) are associated with layout information such as alignment, font, and 
character size. Except for special commands (for example, \documentstyle{jreport}), LaTeX 
applies the specified layout to the character string enclosed in the braces ({...}) that 
follows the command. Character strings that are not enclosed by a command and braces, 
that is, strings to which no command is applied, are laid out with the standard alignment, 
font, and character size. In common-format conversion, the command expressions in a 
document such as drawing 4 are replaced with tag expressions. For example, for the title 
part "\title{ODA-based ****}", "\title{" is replaced with "<title>" and the "}" that follows the 
character string is replaced with "</title>", producing the description "<title>ODA-based 
****</title>". Similarly, a chapter title "\chapter{...}" and a section title "\section{...}" are 
replaced with "<chapter>...</chapter>" and "<section>...</section>", respectively. A 
character string with no command that appears after a chapter title or section title is 
treated as a paragraph, and the paragraph tags <para> and </para> are added to its head 
and tail. In addition, the tags marking the start and end of the document itself (<doc> and 
</doc>) are added to the head and tail of the document. This processing yields a 
common-format document such as that shown in drawing 5. 
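The replacement of LaTeX commands with tags described above can be sketched with a regular expression. Several simplifications are assumptions of this sketch, not part of the method as described: each command sits on its own line as `\name{argument}`, every non-empty line without a command is one paragraph, and commands other than `\title`, `\chapter`, and `\section` (such as `\documentstyle`) are simply dropped.

```python
import re

# LaTeX commands rewritten as tags; tag names follow the examples in the text.
COMMAND_TAGS = {"title": "title", "chapter": "chapter", "section": "section"}
# Assumption: one command per line, written as \name{argument}.
COMMAND = re.compile(r"^\\(\w+)\{(.*)\}$")


def to_common_format(latex: str) -> str:
    """Rewrite a simple LaTeX document as an SGML common-format document."""
    out = ["<doc>"]
    for raw in latex.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = COMMAND.match(line)
        if m and m.group(1) in COMMAND_TAGS:
            tag = COMMAND_TAGS[m.group(1)]
            out.append(f"<{tag}>{m.group(2)}</{tag}>")
        elif m:
            continue  # assumption: other commands such as \documentstyle are dropped
        else:
            out.append(f"<para>{line}</para>")  # text without a command is a paragraph
    out.append("</doc>")
    return "\n".join(out)


sample = r"""\documentstyle{jreport}
\title{Sample Title}
\chapter{Introduction}
First paragraph.
\section{Background}
Second paragraph.
"""
print(to_common_format(sample))
```

A real converter would also need to handle nested braces, multi-line paragraphs, and environments such as `tabular`; those are omitted here.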

[0015] Next, at step 304, the input SGML document, or the common-format document 
generated at step 303, is treated as an SGML document; its syntax is analyzed and 
tree-structured document structure data is generated. In this step, for example, document 
structure data such as that shown in drawing 6 is generated from a common-format 
document such as that shown in drawing 5. The processing from step 305 onward operates 
on the document structure data generated at step 304, and the specification and execution 
of conversion processing for each structure element in the document structure are carried 
out by the method described in JP,7-105216,A. 
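For explicitly tagged text like the common-format document, building the tree reduces to a stack-based scan of start tags, end tags, and character data. The sketch below assumes every element has both tags and no attributes; a real SGML parser also applies the DTD (tag omission, entities, attributes), which this sketch does not attempt. The names `Element`, `TOKEN`, and `parse` are illustrative.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


# A start tag, an end tag, or a run of character data.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)>|([^<]+)")


def parse(text: str) -> Element:
    """Build tree-structured document structure data from explicitly tagged text."""
    root = Element("#root")
    stack = [root]
    for m in TOKEN.finditer(text):
        slash, name, chars = m.groups()
        if chars is not None:
            if chars.strip():
                stack[-1].children.append(chars.strip())  # character data
        elif slash:
            if stack[-1].name != name:
                raise ValueError(f"</{name}> does not close <{stack[-1].name}>")
            stack.pop()  # end tag closes the current element
        else:
            element = Element(name)
            stack[-1].children.append(element)
            stack.append(element)  # start tag opens a child element
    if len(stack) != 1:
        raise ValueError(f"<{stack[-1].name}> is never closed")
    return root.children[0]
```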

[0016] At step 305, it is determined whether the document contains figures; if it does, 
image data generation processing, which extracts the figure parts as image data files, is 
performed at step 306. At step 307, it is determined whether the document contains tables; 
if it does, at step 308 the structure of each table is analyzed and table data generation 
processing is performed to generate a description of the table in the specified output 
format. 
[0017] At step 309, to output the document in the specified format, structural 
transformation of the document structure data is performed, such as removing specific 
structure elements or reordering structure elements. For example, when converting a 
common-format document such as drawing 5 into an SGML document, the layout 
information in the common-format document is no longer needed, so the structure 
elements carrying layout information are removed from document structure data such as 
drawing 6, structure element names are changed as needed, and the data is converted into 
a hierarchical document structure such as the "report" shown in drawing 7. 
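Two of the transformations named above, removing layout elements and renaming elements, can be sketched as one recursive rewrite. The layout element name `center` and the rename `doc` to `report` are hypothetical examples; this sketch also assumes that a removed layout element's content is kept and spliced into its parent, and it does not perform the regrouping into a deeper hierarchy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Union


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def restructure(node, drop: Set[str], rename: Dict[str, str]) -> list:
    """Return the node as a list of replacement nodes.

    Elements named in `drop` are removed and their content is spliced into the parent;
    every other element keeps its position and may be renamed.
    """
    if isinstance(node, str):
        return [node]
    children = [kid for child in node.children for kid in restructure(child, drop, rename)]
    if node.name in drop:
        return children
    return [Element(rename.get(node.name, node.name), children)]


# Hypothetical layout element "center" wrapped around a title, and a doc -> report rename.
tree = Element("doc", [Element("title", [Element("center", ["Sample Title"])]), Element("para", ["Text."])])
(report,) = restructure(tree, drop={"center"}, rename={"doc": "report"})
```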
[0018] At step 310, a document in the specified output format is generated by converting 
the document structure data after structural transformation into the specified description 
format and outputting the character strings. For example, an SGML document such as 
drawing 8 can be output by traversing document structure data such as drawing 7 and 
outputting character strings according to the contents of each structure element. 
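Output by traversal can be illustrated with a small serializer that writes each element as a start tag, its content, and an end tag. The indentation scheme and the name `to_sgml` are assumptions of this sketch, and it does not escape `<` or `&` in character data, which a real output stage would have to handle.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def to_sgml(node: Union[Element, str], depth: int = 0) -> str:
    """Serialize document structure data as indented SGML, one element per line."""
    pad = "  " * depth
    if isinstance(node, str):
        return f"{pad}{node}\n"
    if all(isinstance(child, str) for child in node.children):
        # Leaf element: keep start tag, content and end tag on one line.
        return f"{pad}<{node.name}>{''.join(node.children)}</{node.name}>\n"
    inner = "".join(to_sgml(child, depth + 1) for child in node.children)
    return f"{pad}<{node.name}>\n{inner}{pad}</{node.name}>\n"


report = Element("report", [Element("title", ["Sample Title"]), Element("para", ["Text."])])
print(to_sgml(report), end="")
```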
[0019] Drawing 9 shows the flow of the image data generation processing at step 306 of 
drawing 3. In SGML, figure data (image data) contained in a document can be described in 
one of two ways: by writing in the document only the file name of an image data file 
(hereafter, an image file) that exists in the data files 6, or by writing in the document 
text-format image data obtained by converting binary image data into a text 
representation. For a document that contains text-format image data, image data 
generation processing extracts the embedded image data into an image file and writes the 
image file name into the document. The processing is therefore unnecessary for a 
document that already contains image file names. Image data generation processing is 
described in detail below. Specific character strings marking the start and end of image 
data appear at the head and tail of the image data contained in a document. Therefore, 
when a document containing bitmap image data converted into a text representation is 
converted into the common format, a description such as drawing 10 is obtained. If the tag 
for figures is <PICTURE>, the bitmap data converted into text is enclosed between the 
<PICTURE> tag and the </PICTURE> tag. Structural analysis of such a description 
generates document structure data in which the bitmap data string is a child of the 
PICTURE structure element, as in drawing 11. In general, the data storage format is 
recorded at the head of image data as information about the image data (hereafter, image 
header information), so the storage format can be obtained easily by reading the image 
header information. Accordingly, at step 3062, the header information contained in the 
text-converted bitmap data string is read, and the data storage format of the figure is 
obtained. At step 3064, a file name for storing only the figure data is generated. At step 
3066, the bitmap data string that is the child of the figure (PICTURE) is output as a text file 
with the file name generated at step 3064. At step 3068, the child of the figure (PICTURE), 
that is, the bitmap data string, is replaced with the file name generated at step 3064. At 
step 3070, the data format of the figure file output at step 3066 is converted. For example, 
for text-converted bitmap data such as drawing 10, the text file output at step 3066 is 
converted into binary and a bitmap file is generated. Furthermore, conversion into other 
image data storage formats is performed using the image conversion program 5-7 if 
needed. 
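Steps 3062 to 3070 can be sketched as follows. The text-to-binary encoding is not specified in the text, so this sketch assumes hexadecimal text; it identifies the storage format from well-known leading "magic" bytes (`BM` for BMP, `\x89PNG` for PNG, `GIF8` for GIF). The file naming scheme `figureNNN` is hypothetical, and this sketch points the PICTURE element at the converted binary file, whereas the text may intend the intermediate text file name.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Union


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


# Leading bytes ("magic numbers") that identify a few common image storage formats.
MAGIC = {b"BM": "bmp", b"\x89PNG": "png", b"GIF8": "gif"}


def storage_format(header: bytes) -> str:
    for magic, fmt in MAGIC.items():
        if header.startswith(magic):
            return fmt
    return "bin"  # unknown format: keep the raw bytes


def extract_picture(picture: Element, out_dir: Path, index: int) -> Path:
    """Cut the text-encoded image out of a PICTURE element (assumed hexadecimal text)."""
    text = picture.children[0]
    # Step 3062: read the image header from the first few encoded bytes.
    fmt = storage_format(bytes.fromhex("".join(text.split())[:8]))
    # Step 3064: generate a file name (hypothetical naming scheme).
    stem = out_dir / f"figure{index:03d}"
    # Step 3066: write the text-encoded data to its own file.
    text_file = stem.with_suffix(".txt")
    text_file.write_text(text)
    # Step 3068: the PICTURE element now refers to a file instead of holding the data.
    image_file = stem.with_suffix("." + fmt)
    picture.children = [image_file.name]
    # Step 3070: convert the text file to binary to obtain the image file.
    image_file.write_bytes(bytes.fromhex(text_file.read_text()))
    return image_file


with tempfile.TemporaryDirectory() as tmp:
    figure = Element("PICTURE", ["42 4D 3A 00\n00 00"])
    image_path = extract_picture(figure, Path(tmp), 1)
    image_name, image_bytes = image_path.name, image_path.read_bytes()
    text_copy_exists = (Path(tmp) / "figure001.txt").exists()
```

`bytes.fromhex` skips ASCII whitespace (Python 3.7 and later), so line breaks inside the encoded data are tolerated.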

[0020] Drawing 12 shows the flow of the table data generation processing at step 308 of 
drawing 3. Drawing 13 shows an example of a table processed by table data generation. 
Written in LaTeX, the table of drawing 13 can be described as in drawing 14. The 
description in drawing 14 is explained below. The first \begin{tabular} marks the start of 
the table description. The {|c|c|c|} that follows is a parameter specifying the column 
attributes of the table: each row consists of three cells, adjacent cells are separated by 
vertical ruled lines, and the character string in each cell is centered. \hline and \cline 
draw a horizontal ruled line at their position: \hline draws the ruled line across all cells in 
the row, while \cline specifies through its parameter (here {2-3}) the range of cells over 
which the ruled line is drawn. "&" marks the boundary between cells of the table, and "\\" 
marks the end of a row. The final \end{tabular} marks the end of the table description. 
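The body of a `tabular` environment can be split into rows, cells, and horizontal rules along the lines described above. In this sketch the function name, the return layout (rules indexed by the row boundary they sit on), and the sample table contents are assumptions; `\multicolumn` and escaped `\&` inside cells are not handled.

```python
import re

# A leading \hline, or \cline{a-b} with a 1-based column range.
RULE = re.compile(r"\s*\\(?:hline|cline\{(\d+)-(\d+)\})")


def parse_tabular(body: str, ncols: int):
    """Split the body of a tabular environment into cell strings and horizontal rules.

    Returns (rows, rules): rows[i] lists the cells of row i, split at "&";
    rules[i] lists the rules drawn between row i-1 and row i as (first, last)
    column ranges, so rules[0] is above the first row and rules[-1] below the last.
    """
    rows, rules = [], [[]]
    for segment in body.split("\\\\"):  # "\\" ends a row
        pos = 0
        while (m := RULE.match(segment, pos)):
            # \hline has no range groups and spans all ncols columns.
            first, last = (int(m.group(1)), int(m.group(2))) if m.group(1) else (1, ncols)
            rules[-1].append((first, last))
            pos = m.end()
        rest = segment[pos:].strip()
        if rest:
            rows.append([cell.strip() for cell in rest.split("&")])
            rules.append([])
    return rows, rules


# Hypothetical contents; the left column of rows 1 and 2 has no rule between them.
body = r"""
\hline
A & B & C \\ \cline{2-3}
  & D & E \\ \hline
"""
rows, rules = parse_tabular(body, 3)
```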

[0021] A table description such as drawing 14 is converted into a common format such as 
drawing 15 by the common-format conversion at step 303 of drawing 3. Document structure 
data such as drawing 16 is generated by applying the document structure analysis at step 
304 of drawing 3 to the description in drawing 15. Table data generation processing grasps 
the table structure from document structure data such as drawing 16 and generates the 
desired table description. First, at step 3082, a table structure table such as drawing 17 is 
generated from this document structure data, and the ruled-line information and the 
character strings of all cells in the table are written into it. As the information needed to 
detect merged cells, the ruled-line information in the table can store, for each cell, whether 
a ruled line exists on each of its four sides and its position coordinates, as well as the 
number of adjacent cells merged with it vertically or horizontally. 
[0022] For document structure data such as drawing 16, the ruled-line information of each 
cell is written based on the structure elements hline and cline, which represent the 
table's ruled lines. "hline" draws the bottom ruled line of every cell in its row, and that line 
at the same time serves as the top ruled line of every cell in the next row. Because "cline" 
(suppose it has the attribute "2-3") draws a bottom ruled line only for the 2nd and 3rd cells 
of its row, the next row has a top ruled line only for its 2nd and 3rd cells. 
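The table structure table and the hline/cline rule (one horizontal rule is the bottom line of the row above and the top line of the row below) can be sketched as follows. The field names are hypothetical, position coordinates are omitted, and the left/right defaults assume a `{|c|c|c|}` column specification. The input lists follow the layout where `rules[i]` holds the column ranges ruled between row i-1 and row i.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CellInfo:
    """One entry of the table structure table (position coordinates omitted)."""
    text: str
    top: bool = False      # ruled line above the cell
    bottom: bool = False   # ruled line below the cell
    left: bool = True      # assumption: {|c|c|c|} puts vertical rules around every cell
    right: bool = True
    rowspan: int = 1       # vertical merge count, filled in by later processing
    colspan: int = 1       # horizontal merge count


def build_structure_table(rows: List[List[str]], rules: List[List[Tuple[int, int]]]):
    """rules[i] holds the (first, last) 1-based column ranges ruled between row i-1 and row i."""
    table = [[CellInfo(text) for text in row] for row in rows]
    for boundary, ranges in enumerate(rules):
        for first, last in ranges:
            for col in range(first - 1, last):
                if boundary > 0:
                    table[boundary - 1][col].bottom = True  # bottom rule of the row above
                if boundary < len(table):
                    table[boundary][col].top = True  # doubles as the top rule of the row below
    return table


table = build_structure_table([["A", "B", "C"], ["", "D", "E"]], [[(1, 3)], [(2, 3)], [(1, 3)]])
```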

[0023] After the ruled-line information of all cells has been written, vertical and 
horizontal merging of cells is detected from this information. Taking the table of drawing 
13 as an example, the leftmost cell of the first row and the leftmost cell of the second row 
are merged vertically. That is, the leftmost cell of the first row is the starting cell of a 
vertical merge, which can be detected in the table structure table as a cell that has a top 
ruled line but no bottom ruled line. When a vertical merge starting cell is found, the 
ruled-line information of the vertically adjacent cell in the next row is examined: if that cell 
has a bottom ruled line, it becomes the merge ending cell; if it does not, the vertical merge 
is taken to continue further. The vertical merge proceeds through adjacent cells in order 
until a cell with a bottom ruled line is reached, and that cell becomes the merge ending cell. 
The number of cells from the merge starting cell to the merge ending cell is written as the 
vertical merge count in the ruled-line information of the merge starting cell. Horizontal 
merging is detected in the same way by examining the right ruled line of each cell, and the 
count is written as the horizontal merge count. In LaTeX, however, the number of 
horizontally merged cells can be given as a parameter of the \multicolumn command, so 
that count is obtained easily from the parameter. A table such as drawing 18 is described 
in LaTeX as shown in drawing 19. 
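The vertical merge rule above (start at a cell with a top rule but no bottom rule, continue down until a cell with a bottom rule) and the `\multicolumn` shortcut can be sketched as below. Marking covered cells with `rowspan = 0` and treating a table that ends without a closing rule as ending the merge are assumptions of this sketch. The regex relies on LaTeX's documented form `\multicolumn{n}{spec}{text}`.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cell:
    text: str
    top: bool
    bottom: bool
    rowspan: int = 1  # 0 marks a cell covered by a merge above it (an assumption of this sketch)


def detect_vertical_merges(table: List[List[Cell]]) -> None:
    """Set rowspan on each vertical merge start cell, following the rule in the text."""
    nrows = len(table)
    for col in range(len(table[0])):
        row = 0
        while row < nrows:
            cell = table[row][col]
            if cell.top and not cell.bottom and row + 1 < nrows:  # merge start cell
                end = row + 1
                while end < nrows - 1 and not table[end][col].bottom:
                    end += 1  # keep going until a cell with a bottom rule is reached
                cell.rowspan = end - row + 1
                for covered in range(row + 1, end + 1):
                    table[covered][col].rowspan = 0
                row = end + 1
            else:
                row += 1


# \multicolumn{n}{spec}{text} states the horizontal merge count directly.
MULTICOLUMN = re.compile(r"\\multicolumn\{(\d+)\}\{[^}]*\}\{(.*)\}$")


def horizontal_span(cell_source: str) -> Tuple[int, str]:
    m = MULTICOLUMN.match(cell_source.strip())
    return (int(m.group(1)), m.group(2)) if m else (1, cell_source.strip())


grid = [[Cell("A", True, False), Cell("B", True, True)],
        [Cell("", False, True), Cell("D", True, True)]]
detect_vertical_merges(grid)
```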
[0024] When generation of the table structure table at step 3082 is complete, table 
structural transformation is performed at step 3084 to convert a table structure such as 
drawing 16 into a structure matching the output format. For example, when a document in 
HTML format is to be output, the table is converted into a structure suited to HTML syntax, 
such as drawing 20. At step 3086, an HTML description such as drawing 21 is output by 
traversing this tree structure and outputting character strings. 
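For HTML output, the merge counts map naturally onto the `rowspan` and `colspan` attributes of `<td>`. The sketch below assumes the convention used above that a cell covered by a vertical merge carries `rowspan = 0` and emits no `<td>`; the `Cell` class and `to_html` name are illustrative. `html.escape` protects `<`, `>`, `&`, and quotes in cell text.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Cell:
    text: str
    rowspan: int = 1  # 0 = covered by a vertical merge, so no <td> is emitted
    colspan: int = 1


def to_html(table: List[List[Cell]]) -> str:
    """Emit an HTML table whose merged cells use rowspan/colspan attributes."""
    lines = ["<table>"]
    for row in table:
        cells = []
        for cell in row:
            if cell.rowspan == 0:
                continue
            attrs = ""
            if cell.rowspan > 1:
                attrs += f' rowspan="{cell.rowspan}"'
            if cell.colspan > 1:
                attrs += f' colspan="{cell.colspan}"'
            cells.append(f"<td{attrs}>{escape(cell.text)}</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


html_table = to_html([[Cell("A", rowspan=2), Cell("B"), Cell("C")],
                      [Cell("", rowspan=0), Cell("D"), Cell("E")]])
print(html_table)
```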
[0025] 

[Effect of the Invention] According to this invention, a common-format document in SGML 
description is generated from a word processor document by replacing the specific 
character strings that represent layout information with SGML tag expressions. By treating 
this as an SGML document and processing it, various conversions, such as changing the 
description format of a document or extracting the figures and tables it contains, can be 
performed easily. As a result, documents can be exchanged and reused regardless of the 
word processor model. 






CLAIMS 



[Claim(s)] 

[Claim 1] A document conversion method in a document conversion apparatus having means 
for replacing specific character strings representing layout information contained in a 
document with SGML tag expressions to generate a common-format document in SGML 
description, means for analyzing the document structure of an arbitrary SGML document, 
means for specifying the processing to be performed on arbitrary structure elements 
constituting the SGML document, and means for executing the processing specified for 
each structure element, the method comprising: generating, for said document, a 
common-format document in which the specific character strings representing layout 
information are replaced with SGML-format tag expressions; and converting the document 
by analyzing the document structure of the common-format document in the same way as 
an SGML document and executing the processing specified in advance for each structure 
element of the common-format document. 
[Claim 2] The document conversion method according to claim 1, wherein, for a document 
containing figure data expressed as text according to a predefined notation, the figure 
data part is cut out and output to a separate file and, if necessary, the figure data is 
converted into a binary format according to the document format of the conversion 
destination, thereby generating an image file and converting the data format of the image 
file. 

[Claim 3] The document conversion method according to claim 1, wherein, for a document 
containing table data expressed as text according to a predefined notation, a table storing 
information about the structure of the table is generated, whereby the table structure is 
grasped from the table data and a table description conforming to a desired notation is 
generated. 
[Claim 4] The document conversion method according to any one of claims 1 to 3, wherein, 
from a document containing previously created figures and tables, a document is 
generated from which at least one of the figures and the tables has been removed. 


